Abstract

Motivated by computer networks and machine-to-machine communication applications, a bidirectional
link is studied in which two nodes, Node 1 and Node 2, communicate to fulfill generally conflicting
informational requirements. Node 2 is able to acquire information from the environment, e.g., via access 
to a remote data base or via sensing. Information acquisition is expensive in terms of system resources, 
e.g., time, bandwidth and energy and thus should be done efficiently by adapting the acquisition process 
to the needs of the application. As a result of the forward communication from Node 1 to Node 2, 
the latter wishes to compute some function, such as a suitable average, of the data available at Node 
1 and of the data obtained from the environment. The forward link is also used by Node 1 to query 
Node 2 with the aim of retrieving suitable information from the environment on the backward link. The 
problem is formulated in the context of multi-terminal rate-distortion theory and the optimal trade-off 
between communication rates, distortions of the information produced at the two nodes and costs for 
information acquisition at Node 2 is derived. The issue of robustness to possible malfunctioning of the 
data acquisition process at Node 2 is also investigated. The results are illustrated via an example that 
demonstrates the different roles played by the forward communication, namely data exchange, query 
and control. 

Index Terms 

Source coding, side information, interactive communication. 
I. Introduction

In computer networks and machine-to-machine links, communication is often interactive and 
serves a number of integrated functions, such as data exchange, query and control. As an
illustrative example, consider the set-up in Fig. 1 in which the terminals labeled Node 1
and Node 2 communicate on bidirectional links. Node 2 has access to a data base or, more 
generally, is able to acquire information from the environment, e.g., through sensors. As a result 
of the communication on the forward link, Node 2 wishes to compute some function, e.g., a 
suitable average, of the data available at Node 1 and of the information retrievable from the 
environment. In addition, Node 1 queries Node 2 on the forward link with the aim of retrieving
some information from the environment through the backward link.




Figure 1. Two-way communication with adaptive data acquisition.

Information acquisition from the environment is generally expensive in terms of system 
resources, e.g., time, bandwidth or energy. For instance, accessing a remote data base requires 
interfacing with a server by following the appropriate protocol, and activating sensors entails 
some energy expenditure. Therefore, data acquisition by Node 2 should be performed efficiently 
by adapting to the informational requirements of Node 1 and Node 2. 

To summarize the discussion above, in the system of Fig. 1 the forward communication from
Node 1 to Node 2 serves three integrated purposes: i) Data exchange: Node 1 provides Node 2
with the information necessary for the latter to compute the desired quantities; ii) Query: Node
1 informs Node 2 about its own informational requirements, to be met via the backward link;
iii) Control: Node 1 instructs Node 2 on the most effective way to perform data acquisition from
the environment in order to satisfy Node 1's query and to allow Node 2 to perform the desired
computation.

This work sets out to analyze the setting in Fig. 1 from a fundamental theoretical standpoint 
via information theory. Specifically, the problem is formulated within the context of network 
rate-distortion theory, and the optimal communication strategy, involving the elements of data 
exchange, query and control, is identified. Examples are worked out to illustrate the relevance of 
the developed theory. Finally, the issue of robustness is tackled by assuming that, unbeknownst 
to Node 1, Node 2 may be unable to acquire information from the environment, due, e.g., to 
energy shortages or malfunctioning. The optimal robust strategy is derived and the examples are
extended to account for this generalized model.

A. Related Work 

The work in this paper builds on the long line of research within network information
theory that deals with source coding with side information (see, e.g., [1] for an introduction).
More specifically, we adopt the model of a side information "vending machine" that was
introduced in [2]. This model accounts for source coding scenarios in which acquiring information
at the receiver entails some cost and thus should be done efficiently. Specifically, in this
model, the quality of the side information $Y$ can be controlled at the decoder by selecting an
action $A$ that affects the effective channel between the source $X$ and the side information $Y$
through a conditional distribution $P_{Y|X,A}(y|x,a)$. The distribution $P_{Y|X,A}(y|x,a)$ defines the side
information "vending machine" as per the nomenclature of [2]. Each action $A$ is associated with
a cost, and the problem is that of characterizing the available trade-offs among rate, distortion
and action cost. We emphasize that the conventional formulation of the source coding problem with
side information instead assumes that the relationship between source and side information is
determined by a given conditional distribution $P_{Y|X}(y|x)$ that cannot be controlled.

Various works have extended the results in [2]. Extensions to multi-terminal models can be
found in [3]. Specifically, references [?]-[?] considered a set-up analogous to the Heegard-Berger
problem [10], [11], in which the side information may or may not be available at the decoder.
In [?], a distributed source coding setting that generalizes [12] to the case of a decoder with a
side information "vending machine" is investigated. Multi-hop models were studied in [?].
In [?], a related problem is considered in which the sequence to be compressed is dependent
on the actions taken by a separate encoder. Other extensions include [?], where the model
of [?] is revisited under the additional constraints of common reconstruction [13] or of secrecy
with respect to an "eavesdropping" node.

Figure 2. Two-way source coding with a side information vending machine at Node 2.

In this paper, the model of a side information "vending machine" is used to model the
information acquisition process at Node 2 in Fig. 1. Unlike [2] and the previous work discussed
above, communication between Node 1 and Node 2 is assumed to be bidirectional. The problem
of characterizing the rate-distortion region for two-way source coding models with conventional
action-independent side information sequences at Node 2 has been addressed in [14], [15], [18]
and references therein.

B. Contributions and Organization of the Paper 

This work studies the model in Fig. 1, which is detailed in terms of a block diagram in Fig.
2. The system model is introduced in Sec. II. The optimal trade-off between the rates of the
bidirectional communication, the distortions of the reconstructions of the desired quantities at
the two nodes, and the budget for information acquisition at Node 2 is derived in Sec. III. An
example that illustrates the application of the developed theory is discussed in Sec. IV. Finally,
in Sec. V the results are extended to the scenario in Fig. 5 in which, unbeknownst to Node 1,
Node 2 may be unable to perform information acquisition.

Notation: Throughout the paper, a random variable is denoted by an upper case letter (e.g.,
$X, Y, Z$) and its realization is denoted by a lower case letter (e.g., $x, y, z$). Moreover, the shorthand
notation $X^n$ is used to denote the tuple (or the column vector) of random variables $(X_1, \ldots, X_n)$,
and $x^n$ is used to denote a realization. We define $[a, b] = \{a, a + 1, \ldots, b\}$ for $a \le b$ and $[a, b] = \emptyset$
otherwise. We say that $X - Y - Z$ forms a Markov chain if $p(x,y,z) = p(x)p(y|x)p(z|y)$, that
is, if $X$ and $Z$ are conditionally independent of each other given $Y$.



II. System Model 



The two-way source coding problem of interest, sketched in Fig. 2, is formally defined by
the probability mass functions (pmfs) $p_X(x)$ and $p_{Y|A,X}(y|a,x)$, and by the discrete alphabets
$\mathcal{X}, \mathcal{Y}, \mathcal{A}, \hat{\mathcal{X}}_1, \hat{\mathcal{X}}_2$, along with distortion and cost metrics to be discussed below. The source
sequence $X^n = (X_1, \ldots, X_n) \in \mathcal{X}^n$ consists of $n$ independent and identically distributed (i.i.d.)
entries $X_i$ for $i \in [1,n]$ with pmf $p_X(x)$. Node 1 measures the sequence $X^n$ and encodes it into a
message $M_1$ of $nR_1$ bits, which is delivered to Node 2. Node 2 wishes to estimate a sequence
$\hat{X}_2^n \in \hat{\mathcal{X}}_2^n$ within given distortion requirements. To this end, Node 2 receives the message $M_1$ and,
based on it, selects an action sequence $A^n \in \mathcal{A}^n$.

The action sequence affects the quality of the measurement $Y^n$ of the sequence $X^n$ obtained at
Node 2. Specifically, given $A^n$ and $X^n$, the sequence $Y^n$ is distributed as $p(y^n|a^n,x^n) = \prod_{i=1}^n p_{Y|A,X}(y_i|a_i,x_i)$.
The cost of the action sequence is defined by a cost function $\Lambda\colon \mathcal{A} \to [0, \Lambda_{\max}]$
with $0 < \Lambda_{\max} < \infty$, as $\Lambda(a^n) = \sum_{i=1}^n \Lambda(a_i)$. The estimated sequence $\hat{X}_2^n \in \hat{\mathcal{X}}_2^n$ is
then obtained as a function of $M_1$ and $Y^n$.

Upon reception on the forward link, Node 2 maps the message $M_1$ received from Node 1
and the locally available sequence $Y^n$ into a message $M_2$ of $nR_2$ bits, which is delivered back
to Node 1. Node 1 estimates a sequence $\hat{X}_1^n \in \hat{\mathcal{X}}_1^n$ as a function of $M_2$ and $X^n$ within given
distortion requirements.

The quality of the estimated sequences $\hat{X}_j^n$ is assessed in terms of the distortion metrics
$d_j(x,y,\hat{x}_j)\colon \mathcal{X} \times \mathcal{Y} \times \hat{\mathcal{X}}_j \to \mathbb{R}_+ \cup \{\infty\}$ for $j = 1,2$, respectively. Note that this implies that $\hat{X}_j^n$
is allowed to be a lossy version of any function of the source and side information sequences. A
more general model is studied in Sec. III-A. It is assumed that $D_{j,\max} = \min_{\hat{x}_j \in \hat{\mathcal{X}}_j} \mathrm{E}[d_j(X, Y, \hat{x}_j)] < \infty$
for $j = 1, 2$. A formal description of the operations at encoder and decoder follows.

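Definition 1. An $(n, R_1, R_2, D_1, D_2, \Gamma, \epsilon)$ code for the set-up of Fig. 2 consists of a source
encoder for Node 1

$$g_1\colon \mathcal{X}^n \to [1, 2^{nR_1}], \tag{1}$$

which maps the sequence $X^n$ into a message $M_1$; "action" functions

$$\ell_i\colon [1, 2^{nR_1}] \times \mathcal{Y}^{i-1} \to \mathcal{A}, \quad \text{for } i \in [1, n], \tag{2}$$

which map the message $M_1$ and the previously observed samples $Y^{i-1}$ into the actions $A_i$, thus defining the action sequence $A^n$; a source
encoder for Node 2

$$g_2\colon \mathcal{Y}^n \times [1, 2^{nR_1}] \to [1, 2^{nR_2}], \tag{3}$$

which maps the sequence $Y^n$ and the message $M_1$ into a message $M_2$; and two decoders, namely

$$h_1\colon [1, 2^{nR_2}] \times \mathcal{X}^n \to \hat{\mathcal{X}}_1^n, \tag{4}$$

which maps the message $M_2$ and the sequence $X^n$ into the estimated sequence $\hat{X}_1^n$, and

$$h_2\colon [1, 2^{nR_1}] \times \mathcal{Y}^n \to \hat{\mathcal{X}}_2^n, \tag{5}$$

which maps the message $M_1$ and the sequence $Y^n$ into the estimated sequence $\hat{X}_2^n$; such that
the action cost constraint $\Gamma$ and the distortion constraints $D_j$, for $j = 1,2$, are satisfied, i.e.,

$$\frac{1}{n}\sum_{i=1}^n \mathrm{E}[\Lambda(A_i)] \le \Gamma \tag{6}$$

$$\text{and} \quad \frac{1}{n}\sum_{i=1}^n \mathrm{E}[d_j(X_i, Y_i, \hat{X}_{ji})] \le D_j \quad \text{for } j = 1, 2. \tag{7}$$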
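For an operational view of Definition 1, the Python sketch below runs one block of $n$ symbols through the mappings (1)-(5) and computes empirical counterparts of the averages in (6)-(7). Everything in it is hypothetical scaffolding: the names `run_block`, `empirical_cost` and `empirical_distortion` are illustrative, messages are modeled as arbitrary Python objects rather than indices in $[1, 2^{nR_j}]$, and a single realization replaces the expectations. It illustrates the interfaces and the causal action rule, not any particular code from the paper.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical signatures mirroring (1)-(5); messages are arbitrary Python
# objects here instead of indices in [1, 2^{nR}].
G1 = Callable[[Sequence[int]], object]               # g_1: x^n -> m_1
Ell = Callable[[object, Tuple[int, ...], int], int]  # ell_i: (m_1, y^{i-1}, i) -> a_i
Chan = Callable[[int, int], int]                     # one use of p(y | a, x)
G2 = Callable[[Sequence[int], object], object]       # g_2: (y^n, m_1) -> m_2
H1 = Callable[[object, Sequence[int]], List[int]]    # h_1: (m_2, x^n) -> xhat_1^n
H2 = Callable[[object, Sequence[int]], List[int]]    # h_2: (m_1, y^n) -> xhat_2^n


def run_block(x: Sequence[int], g1: G1, ell: Ell, chan: Chan,
              g2: G2, h1: H1, h2: H2):
    """Runs one block of n source symbols through the operations (1)-(5)."""
    m1 = g1(x)                           # forward message M_1
    a: List[int] = []
    y: List[int] = []
    for i, x_i in enumerate(x):
        a_i = ell(m1, tuple(y), i)       # may depend on M_1 and on Y^{i-1}
        a.append(a_i)
        y.append(chan(a_i, x_i))         # memoryless side information channel
    m2 = g2(y, m1)                       # backward message M_2
    return a, y, h1(m2, x), h2(m1, y)


def empirical_cost(a: Sequence[int], cost: Callable[[int], float]) -> float:
    """Empirical counterpart of the left-hand side of (6)."""
    return sum(cost(a_i) for a_i in a) / len(a)


def empirical_distortion(x, y, xhat, d) -> float:
    """Empirical counterpart of the left-hand side of (7)."""
    return sum(d(xi, yi, xh) for xi, yi, xh in zip(x, y, xhat)) / len(x)
```

For example, an action rule that measures only even-indexed samples yields an empirical cost of 0.5 with $\Lambda(a) = a$.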



Definition 2. Given a distortion-cost tuple $(D_1, D_2, \Gamma)$, a rate tuple $(R_1, R_2)$ is said to be
achievable if, for any $\epsilon > 0$ and sufficiently large $n$, there exists an $(n, R_1, R_2, D_1+\epsilon, D_2+\epsilon, \Gamma+\epsilon)$
code.

Definition 3. The rate-distortion-cost region $\mathcal{R}(D_1, D_2, \Gamma)$ is defined as the closure of all rate
tuples $(R_1, R_2)$ that are achievable given the distortion-cost tuple $(D_1, D_2, \Gamma)$.

Remark 1. For the special case in which the side information $Y$ is independent of the action $A$
given $X$, i.e., for $p(y|a,x) = p(y|x)$, the rate-distortion region $\mathcal{R}(D_1, D_2, \Gamma)$ has been derived
in [?]. Instead, if $D_2 = D_{2,\max}$, the set of all achievable rates $R_1$ was characterized in [?].

Remark 2. The definition (2) of the action functions allows for adaptation of the actions to the
previously observed values of the side information $Y$. This possibility was studied in [16] for
the point-to-point one-way model, which is obtained by setting $R_2 = 0$ in the setting of Fig. 2.

In the following sections, for simplicity of notation, we drop the subscripts from the definition 
of the pmfs, thus identifying a pmf by its argument. 
III. Rate-Distortion-Cost Region
In this section, a single-letter characterization of the rate-distortion-cost region is derived. 

Proposition 1. The rate-distortion-cost region $\mathcal{R}(D_1, D_2, \Gamma)$ for the two-way source coding
problem illustrated in Fig. 2 is given by the union of all rate pairs $(R_1, R_2)$ that satisfy the
conditions

$$R_1 \ge I(X;A) + I(X;U|A,Y) \tag{8a}$$

$$\text{and} \quad R_2 \ge I(Y;V|A,X,U), \tag{8b}$$

where the mutual information terms are evaluated with respect to the joint pmf

$$p(x, y, a, u, v) = p(x)p(a, u|x)p(y|a, x)p(v|a, u, y), \tag{9}$$

for some pmfs $p(a, u|x)$ and $p(v|a, u, y)$ such that the inequalities

$$\mathrm{E}[d_1(X, Y, f_1(V, X))] \le D_1, \tag{10a}$$

$$\mathrm{E}[d_2(X, Y, f_2(U, Y))] \le D_2, \tag{10b}$$

$$\text{and} \quad \mathrm{E}[\Lambda(A)] \le \Gamma, \tag{10c}$$

are satisfied for some functions $f_1\colon \mathcal{V} \times \mathcal{X} \to \hat{\mathcal{X}}_1$ and $f_2\colon \mathcal{U} \times \mathcal{Y} \to \hat{\mathcal{X}}_2$. Finally, $U$ and $V$ are
auxiliary random variables whose alphabet cardinality can be constrained as $|\mathcal{U}| \le |\mathcal{X}||\mathcal{A}| + 4$
and $|\mathcal{V}| \le |\mathcal{U}||\mathcal{Y}||\mathcal{A}| + 1$ without loss of optimality.
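To make (8a)-(8b) concrete, the following Python/NumPy sketch evaluates the two bounds for finite alphabets, given the factors of the joint pmf (9) as arrays. The array layout, the function names and the toy instance are assumptions made for illustration. The sketch only evaluates the bounds for one fixed choice of $p(a,u|x)$ and $p(v|a,u,y)$; it neither optimizes over these pmfs nor checks the constraints (10a)-(10c).

```python
import numpy as np

# Axis order of the joint pmf array p[x, a, y, u, v] (an assumed layout).
AXES = {"x": 0, "a": 1, "y": 2, "u": 3, "v": 4}


def joint_pmf(p_x, p_au_x, p_y_ax, p_v_auy):
    """(9): p(x,a,y,u,v) = p(x) p(a,u|x) p(y|a,x) p(v|a,u,y).

    Shapes: p_x[x], p_au_x[x,a,u], p_y_ax[a,x,y], p_v_auy[a,u,y,v]."""
    return np.einsum("x,xau,axy,auyv->xayuv", p_x, p_au_x, p_y_ax, p_v_auy)


def entropy(p, keep):
    """Entropy in bits of the marginal of p on the variables named in `keep`."""
    drop = tuple(ax for name, ax in AXES.items() if name not in keep)
    m = np.asarray(p.sum(axis=drop)).ravel()
    m = m[m > 0]
    return float(-(m * np.log2(m)).sum())


def cond_mi(p, first, second, given=""):
    """I(first; second | given), with variables given by their single-letter names."""
    return (entropy(p, first + given) + entropy(p, second + given)
            - entropy(p, given) - entropy(p, first + second + given))


def rate_bounds(p):
    """Right-hand sides of (8a) and (8b)."""
    r1 = cond_mi(p, "x", "a") + cond_mi(p, "x", "u", "ay")
    r2 = cond_mi(p, "y", "v", "axu")
    return r1, r2


def binary_vending_example(action):
    """Toy instance: X ~ Bern(0.5), A fixed to `action`, U = X, V constant,
    Y = X if A = 1 and Y = phi (encoded as 2) if A = 0."""
    p_x = np.array([0.5, 0.5])
    p_au_x = np.zeros((2, 2, 2))
    p_y_ax = np.zeros((2, 2, 3))
    for x in range(2):
        p_au_x[x, action, x] = 1.0
        p_y_ax[1, x, x] = 1.0   # A = 1: observe X
        p_y_ax[0, x, 2] = 1.0   # A = 0: dummy symbol phi
    p_v_auy = np.zeros((2, 2, 3, 1))
    p_v_auy[..., 0] = 1.0       # V is constant
    return joint_pmf(p_x, p_au_x, p_y_ax, p_v_auy)
```

In the toy instance, never acquiring side information ($A = 0$) leaves the whole description of $X$ to the forward link, so the first bound evaluates to 1 bit, whereas always acquiring it ($A = 1$) drives it to 0.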

Remark 3. For the special case in which the side information $Y$ is independent of the action $A$
given $X$, i.e., for $p(y|a,x) = p(y|x)$, the rate-distortion region $\mathcal{R}(D_1, D_2, \Gamma)$ in Proposition 1
reduces to that derived in [14], [15]. Instead, if $D_2 = D_{2,\max}$, the result reduces to that in [?].

The proof of the converse is provided in Appendix A. The achievability follows as a combination
of the techniques proposed in [2] and [14], and requires the forward link to be used, in
an integrated manner, for data exchange, query and control. Specifically, for the forward link,
similar to [?], Node 1 uses a successive refinement codebook. The base layer is
used by Node 1 to instruct Node 2 on which actions are best tailored to fulfill the informational
requirements of both Node 1 and Node 2. This base layer thus represents control information
that also serves the purpose of querying Node 2 in view of the backward communication. We
observe that Node 1 selects this base layer as a function of the source $X^n$, thus allowing Node
2 to adapt its actions for information acquisition to the current realization of the source $X^n$.
The refinement layer of the code used by Node 1 is leveraged, instead, to provide additional
information to Node 2 in order to meet Node 2's distortion requirement. Node 2 then employs
standard Wyner-Ziv coding (i.e., binning) [?] for the backward link to satisfy Node 1's distortion
requirement.

We now briefly outline the main technical aspects of the achievability proof; the details
follow from standard arguments. Node 1 first maps the sequence $X^n$ into the action sequence $A^n$ using the standard joint typicality
criterion. This mapping requires a codebook of rate $I(X;A)$ (see, e.g., [1, pp. 62-63]). Given the
sequence $A^n$, the description of the sequence $X^n$ is further refined through a mapping to a sequence
$U^n$. This requires a codebook of rate $I(X;U|A,Y)$ for each action sequence $A^n$, using Wyner-Ziv
binning with respect to the side information $Y^n$ [1, pp. 62-63]. On the backward link, Node 2 employs
Wyner-Ziv coding for the sequence $Y^n$ by leveraging the side information $X^n$ available at Node
1 and conditioning on the sequences $U^n$ and $A^n$, which are known to both Node 1 and Node 2 as a
result of the communication on the forward link. This requires a rate equal to the right-hand side
of (8b). Finally, Node 1 and Node 2 produce the estimates $\hat{X}_1^n$ and $\hat{X}_2^n$ as the symbol-by-symbol
functions $\hat{X}_{1i} = f_1(V_i, X_i)$ and $\hat{X}_{2i} = f_2(U_i, Y_i)$ for $i \in [1, n]$, respectively.
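The forward refinement rate quoted above can be checked with a short calculation, included here as a sketch. Wyner-Ziv binning of $U^n$, conditionally on $A^n$ and with side information $Y^n$ at the decoder, requires $I(X;U|A) - I(Y;U|A)$, and the Markov chain $U - (A,X) - Y$ implied by the factorization (9) turns this difference into the second term of (8a):

```latex
\begin{align*}
I(X;U|A) - I(Y;U|A)
  &= \big[H(U|A) - H(U|A,X)\big] - \big[H(U|A) - H(U|A,Y)\big] \\
  &= H(U|A,Y) - H(U|A,X) \\
  &= H(U|A,Y) - H(U|A,X,Y) \qquad \text{(since } U - (A,X) - Y\text{)} \\
  &= I(X;U|A,Y).
\end{align*}
```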

Remark 4. The achievability scheme discussed above uses actions that do not adapt to the previous
values of the side information $Y$. The fact that this scheme attains the optimal performance
characterized in Proposition 1 shows that, as demonstrated in [16] for the one-way model with
$R_2 = 0$, adaptive actions do not improve the rate-distortion performance.

A. Indirect Rate-Distortion-Cost Region 

In this section, we consider a more general model in which Node 1 observes only a noisy
version of the source $X^n$, as depicted in Fig. 3. Following [?], we refer to this setting as posing
an indirect source coding problem. The example studied in Sec. IV illustrates the relevance of
this generalization. The system model is as defined in Sec. II with the following differences.
The source encoder for Node 1

$$g_1\colon \mathcal{Z}^n \to [1, 2^{nR_1}] \tag{11}$$

maps the sequence $Z^n$ into a message $M_1$; the decoder for Node 1

$$h_1\colon [1, 2^{nR_2}] \times \mathcal{Z}^n \to \hat{\mathcal{X}}_1^n \tag{12}$$

maps the message $M_2$ and the sequence $Z^n$ into the estimated sequence $\hat{X}_1^n$; given $(X^n, A^n, Z^n)$,
the side information $Y^n$ is distributed as $p(y^n|a^n, x^n, z^n) = \prod_{i=1}^n p_{Y|A,X,Z}(y_i|a_i, x_i, z_i)$; and the
distortion constraints are given as

$$\frac{1}{n}\sum_{i=1}^n \mathrm{E}[d_j(X_i, Y_i, Z_i, \hat{X}_{ji})] \le D_j \quad \text{for } j = 1, 2, \tag{13}$$

for some distortion metrics $d_j(x, y, z, \hat{x}_j)\colon \mathcal{X} \times \mathcal{Y} \times \mathcal{Z} \times \hat{\mathcal{X}}_j \to \mathbb{R}_+ \cup \{\infty\}$, for $j = 1,2$. The
next proposition derives a single-letter characterization of the rate-distortion-cost region.

Figure 3. Indirect two-way source coding with a side information vending machine at Node 2.

Proposition 2. The rate-distortion-cost region $\mathcal{R}(D_1, D_2, \Gamma)$ for the indirect two-way source
coding problem illustrated in Fig. 3 is given by the union of all rate pairs $(R_1, R_2)$ that satisfy
the conditions

$$R_1 \ge I(Z;A) + I(Z;U|A,Y) \tag{14a}$$

$$\text{and} \quad R_2 \ge I(Y;V|A,Z,U), \tag{14b}$$

where the mutual information terms are evaluated with respect to the joint pmf

$$p(x, y, z, a, u, v) = p(x, z)p(a, u|z)p(y|a, x, z)p(v|a, u, y), \tag{15}$$

for some pmfs $p(a, u|z)$ and $p(v|a, u, y)$ such that the inequalities

$$\mathrm{E}[d_1(X, Y, Z, f_1(V, Z))] \le D_1, \tag{16a}$$

$$\mathrm{E}[d_2(X, Y, Z, f_2(U, Y))] \le D_2, \tag{16b}$$

$$\text{and} \quad \mathrm{E}[\Lambda(A)] \le \Gamma, \tag{16c}$$

are satisfied for some functions $f_1\colon \mathcal{V} \times \mathcal{Z} \to \hat{\mathcal{X}}_1$ and $f_2\colon \mathcal{U} \times \mathcal{Y} \to \hat{\mathcal{X}}_2$. Finally, $U$ and $V$ are
auxiliary random variables whose alphabet cardinality can be constrained as $|\mathcal{U}| \le |\mathcal{Z}||\mathcal{A}| + 3$
and $|\mathcal{V}| \le |\mathcal{U}||\mathcal{Y}||\mathcal{A}| + 1$ without loss of optimality.

The proof of the achievability and converse follows with slight modifications from that of
Proposition 1. Specifically, in the achievability the sequence $X^n$ is replaced by its noisy version,
i.e., the sequence $Z^n$, and the rest of the proof remains essentially unchanged. The proof of the
converse is provided in Appendix A.

IV. Example 

In this section, we consider a binary example for the set-up in Fig. 3 to illustrate the main
aspects of the problem and the relevance of the theoretical results derived above. Specifically, we
assume binary alphabets $\mathcal{X} = \mathcal{A} = \{0, 1\}$ and a source distribution $X \sim \mathrm{Bern}(0.5)$. Moreover,
the source $Z^n$ measured by Node 1 is an erased version of the source $X^n$ with erasure probability
$\epsilon$. This means that $Z_i = \mathrm{e}$, where $\mathrm{e}$ represents an erasure, with probability $\epsilon$, and $Z_i = X_i$ with
probability $1 - \epsilon$, for $i \in [1,n]$.

The vending machine at Node 2 operates as follows:

$$Y = \begin{cases} X & \text{for } A = 1, \\ \phi & \text{for } A = 0, \end{cases} \tag{17}$$

with cost function $\Lambda(a) = a$, for $a \in \{0, 1\}$, where $\phi$ is a dummy symbol representing the case
in which no useful information is acquired by Node 2. This model implies that, given the constraint (6), a cost budget of
$\Gamma$ limits the number of samples of $X^n$ that can be measured by Node 2 to
around $n\Gamma$ on average.
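The binary model of this section can be simulated directly. The Python sketch below, offered as an illustration, generates the erased observation $Z^n$ at Node 1, passes an action sequence through the vending machine (17), and reports the empirical cost. The names `erase`, `vending_machine` and `average_cost`, and the encodings of $\mathrm{e}$ and $\phi$, are hypothetical. The action rule in the demo (measure exactly the erased positions) is one illustrative choice, not an optimized policy.

```python
import random

ERASURE = "e"   # hypothetical encoding of the erasure symbol e
PHI = None      # dummy symbol phi in (17): nothing acquired by Node 2


def erase(x, eps, rng):
    """Erasure model of Sec. IV: Z_i = e w.p. eps, else Z_i = X_i."""
    return [ERASURE if rng.random() < eps else xi for xi in x]


def vending_machine(a, x):
    """Binary vending machine (17): Y_i = X_i if A_i = 1, else phi."""
    return [xi if ai == 1 else PHI for ai, xi in zip(a, x)]


def average_cost(a):
    """(1/n) * sum_i Lambda(A_i) with Lambda(a) = a, i.e. fraction measured."""
    return sum(a) / len(a)


if __name__ == "__main__":
    rng = random.Random(0)
    n, eps = 10_000, 0.2
    x = [rng.randint(0, 1) for _ in range(n)]
    z = erase(x, eps, rng)
    # Illustrative (not optimized) action rule: measure only erased positions.
    a = [1 if zi == ERASURE else 0 for zi in z]
    y = vending_machine(a, x)
    print(f"fraction measured = {average_cost(a):.3f} (eps = {eps})")
```

With this rule the empirical cost concentrates around $\epsilon$ for large $n$, which is the minimum budget needed in Sec. IV-B.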

Node 1 wishes to reconstruct a lossy version of the source $X^n$, while Node 2 is interested in $Z^n$.
The distortion functions are the Hamming metrics $d_1(x, \hat{x}_1) = 1\{x \ne \hat{x}_1\}$ and $d_2(z, \hat{x}_2) = 1\{z \ne \hat{x}_2\}$.
To obtain analytical insight into the rate-distortion-cost region, in the following we focus on a
number of special cases.

A. $D_1 = D_{1,\max}$ and $D_2 = 0$

Consider the distortion requirements $D_1 = D_{1,\max}$ and $D_2 = 0$. As a result, Node 1 requires
no backward communication from Node 2, while Node 2 wishes to recover $Z^n$ losslessly. For
the given distortions, the rate-cost region in Proposition 2 can be evaluated as

$$R_1 \ge H_2(\epsilon) + (1 - \epsilon - \Gamma)^+ \tag{18a}$$

$$\text{and} \quad R_2 \ge 0, \tag{18b}$$

for any cost budget $\Gamma \ge 0$, where $H_2(a) = -a\log_2 a - (1 - a)\log_2(1 - a)$ is the binary entropy
function and $(x)^+ = \max\{x, 0\}$.

A formal proof of this result can be found in Appendix B. The rate region (18) shows that,
as the cost budget $\Gamma$ for information acquisition increases, the required rate $R_1$ decreases down
to the rate $H_2(\epsilon)$ that is required to describe only the erasure process $E^n$, with $E_i = 1\{Z_i = \mathrm{e}\}$,
$i \in [1, n]$, losslessly to Node 2. This can be explained by noting that the following time-sharing
strategy achieves the region (18) and is thus optimal.

Node 1 describes the process $E^n$ losslessly to Node 2 with $H_2(\epsilon)$ bits per symbol. This information
can be interpreted as control data that is used by Node 2 to adapt its information acquisition process.
In order to obtain a lossless reconstruction of $Z^n$, Node 2 also needs to be informed about $Z_i = X_i$ for all
$i$ in which $E_i = 0$. Note that there are around $n(1 - \epsilon)$ such samples
of $Z_i$. Node 1 describes $Z_i = X_i$ for $n(1 - \epsilon - \Gamma)^+$ of these samples, while the remaining
$n\min(\Gamma, 1 - \epsilon)$ are measured by Node 2 through the vending machine. An alternative strategy
based directly on Proposition 2 can be found in Appendix B.
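The following Python sketch simply transcribes (18a) and the per-symbol bookkeeping of the time-sharing scheme just described; the function names are hypothetical.

```python
import math


def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def rate_case_a(eps, gamma):
    """Minimum R_1 in (18a): H_2(eps) + (1 - eps - Gamma)^+."""
    return h2(eps) + max(0.0, 1.0 - eps - gamma)


def time_sharing_split(eps, gamma):
    """Per-symbol fractions of non-erased samples in the Sec. IV-A scheme:
    (described by Node 1, measured by Node 2 through the vending machine)."""
    described = max(0.0, 1.0 - eps - gamma)
    measured = min(gamma, 1.0 - eps)
    return described, measured
```

For $\epsilon = 0.2$, the rate starts at $H_2(0.2) + 0.8 \approx 1.52$ bits for $\Gamma = 0$ and decreases linearly to $H_2(0.2) \approx 0.72$ bits at $\Gamma = 0.8$. The two fractions returned by `time_sharing_split` always add up to $1 - \epsilon$.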

Fig. 4 illustrates the rate $R_1$ in (18a) versus the cost budget $\Gamma$ for $\epsilon = 0.2$. We observe that
if $\Gamma \ge 1 - \epsilon = 0.8$, no further improvement of the rate is possible, as per (18a).

B. $D_1 = 0$ and $D_2 = D_{2,\max}$

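Here we consider the dual case in which Node 1 wishes to reconstruct the sequence $X^n$ losslessly
($D_1 = 0$), while Node 2 does not have any distortion requirements ($D_2 = D_{2,\max}$). As shown in
Appendix B, if $\Gamma \ge \epsilon$, the rate-cost region is given by the union of all rate pairs $(R_1, R_2)$ such
that

$$R_1 \ge H_2(\Gamma) - (1 - \epsilon)H_2\left(\frac{\Gamma - \epsilon}{1 - \epsilon}\right) \tag{19a}$$

$$\text{and} \quad R_2 \ge \epsilon. \tag{19b}$$

(Since $\Lambda(a) \le 1$, a budget $\Gamma > 1$ is equivalent to $\Gamma = 1$, for which the right-hand side of (19a) is zero.)
Moreover, for $\Gamma < \epsilon$, the region is empty, as the lossless reconstruction of $X^n$ at Node 1 is not
feasible.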
A proof of this result based on Proposition 2 can be found in Appendix B. In the following,
we argue that a natural time-sharing strategy, akin to that used for the case $D_1 = D_{1,\max}$, $D_2 = 0$
above, would be suboptimal, implying that the optimal strategy requires a more sophisticated
approach based on the successive refinement code presented in Sec. III.
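The bound $R_1 \ge H_2(\Gamma) - (1-\epsilon)H_2\big((\Gamma-\epsilon)/(1-\epsilon)\big)$ can be motivated by a short calculation, offered here as a sketch that follows the reasoning of this subsection rather than the proof in Appendix B. Lossless reconstruction at Node 1 requires $\Pr(A = 1|E = 1) = 1$, since otherwise some erased samples are observed by neither node. Writing $q = \Pr(A = 1|E = 0)$, the cost constraint reads $\epsilon + (1-\epsilon)q \le \Gamma$, and since $R_1 \ge I(Z;A) + I(Z;U|A,Y) \ge I(Z;A)$ by (14a):

```latex
\begin{align*}
I(Z;A) \;\ge\; I(E;A) &= H(A) - H(A|E) \\
  &= H_2\big(\epsilon + (1-\epsilon)q\big) - (1-\epsilon)H_2(q) \\
  &\ge H_2(\Gamma) - (1-\epsilon)H_2\Big(\frac{\Gamma-\epsilon}{1-\epsilon}\Big).
\end{align*}
```

The first inequality holds because $E$ is a function of $Z$. The last step uses the fact that the middle expression is non-increasing in $q$: its derivative is $(1-\epsilon)\log_2\frac{q(1-p)}{p(1-q)} \le 0$ with $p = \epsilon + (1-\epsilon)q \ge q$. Equality throughout is attained when $A$ depends on $Z$ only through $E$, $q = (\Gamma-\epsilon)/(1-\epsilon)$, and $U$ is constant.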

A natural time-sharing strategy would be the following. Node 1 describes $n\eta$ samples of the
erasure process $E^n$, for some $0 \le \eta \le 1$, losslessly to Node 2, using rate $R_1 = \eta H_2(\epsilon)$. This
information is used by Node 1 to query Node 2 about the desired information. Specifically, Node
2 sets $A_i = 1$ if $E_i = 1$ (and $A_i = 0$ otherwise), thus observing around $n\eta\epsilon$ samples $Y_i = X_i$ from the vending machine.
These samples are needed to fulfill the distortion requirement of Node 1. For all the remaining
$n(1 - \eta)$ samples, for which Node 2 does not have control information from Node 1, Node 2
sets $A_i = 1$, thus acquiring all the side information samples. Again, this is necessary given Node
1's requirements. Node 2 conveys losslessly the $n\eta\epsilon$ samples $Y_i = X_i$ obtained when $E_i = 1$,
which requires $\eta\epsilon$ bits per sample, along with the $n(1 - \eta)$ samples $Y_i$ in the second set, which
amount instead to $(1 - \eta)H(X|Z)$ bits per sample. Note that we have the rate $H(X|Z)$ by the
Slepian-Wolf theorem [1, Chapter 10], since Node 1 has side information $Z_i$ for the second set
of samples, and that $H(X|Z) = \epsilon$, since $X$ is uniform and is erased with probability $\epsilon$. Overall, we have $R_2 = \eta\epsilon + (1 - \eta)\epsilon = \epsilon$ bits/source symbol. This entails a cost
budget of $\Gamma = \eta\epsilon + 1 - \eta$, and thus $\eta = (1 - \Gamma)/(1 - \epsilon)$.

Fig. 4 compares the rate $R_1$ in (19a) with the corresponding rate $\eta H_2(\epsilon)$ obtained via time-sharing,
for $\epsilon = 0.2$. As seen, in this second case the time-sharing strategy is strictly suboptimal.
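The comparison can be reproduced numerically. The Python sketch below takes the optimal rate as $H_2(\Gamma) - (1-\epsilon)H_2\big((\Gamma-\epsilon)/(1-\epsilon)\big)$ and the time-sharing rate as $\eta H_2(\epsilon)$ with $\eta = (1-\Gamma)/(1-\epsilon)$; the function names are hypothetical and only $\epsilon \le \Gamma \le 1$ is handled.

```python
import math


def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def rate_case_b_optimal(eps, gamma):
    """R_1 = H_2(Gamma) - (1 - eps) H_2((Gamma - eps) / (1 - eps))."""
    if not eps <= gamma <= 1.0:
        raise ValueError("requires eps <= gamma <= 1")
    return h2(gamma) - (1 - eps) * h2((gamma - eps) / (1 - eps))


def rate_case_b_time_sharing(eps, gamma):
    """R_1 = eta * H_2(eps) with eta = (1 - Gamma) / (1 - eps)."""
    if not eps <= gamma <= 1.0:
        raise ValueError("requires eps <= gamma <= 1")
    eta = (1 - gamma) / (1 - eps)
    return eta * h2(eps)
```

The two rates coincide at the endpoints $\Gamma = \epsilon$ (both equal to $H_2(\epsilon)$) and $\Gamma = 1$ (both zero). In between, the time-sharing rate is the straight line joining these endpoints, while the optimal rate lies strictly below it. For instance, at $\epsilon = 0.2$ and $\Gamma = 0.6$, the sketch gives about 0.171 bits versus about 0.361 bits.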

C. $D_1 = D_2 = 0$

We now consider the case in which both nodes wish to achieve lossless reconstruction, i.e.,
$D_1 = D_2 = 0$. As seen in the previous case, achieving $D_1 = 0$ is not possible if $\Gamma < \epsilon$, and thus
this is a fortiori true for $D_1 = D_2 = 0$. For $\Gamma \ge \epsilon$, the rate-cost region is given by

$$R_1 \ge H_2(\epsilon) + (1 - \Gamma) \tag{20a}$$

$$\text{and} \quad R_2 \ge \epsilon, \tag{20b}$$

as shown in Appendix B.

A time-sharing strategy that achieves (20) is as follows. Node 1 describes the process $E^n$
losslessly to Node 2 with $H_2(\epsilon)$ bits per symbol. This information serves the functions of query
and control for Node 2. In order to satisfy its distortion requirement, Node 2 now needs to be
informed about $Z_i = X_i$ for all $i$ in which $E_i = 0$. Note that there are around $n(1 - \epsilon)$ such samples of
$Z_i$. Node 1 describes $Z_i = X_i$ for $n(1 - \Gamma) \le n(1 - \epsilon)$ of these samples, while the remaining
$n(\Gamma - \epsilon)$ are measured by Node 2 through the vending machine. In addition, Node 2 measures the
around $n\epsilon$ samples with $E_i = 1$, which are needed by Node 1, so that the total cost is about $n\Gamma$.
Node 2 compresses losslessly the sequence of these around $n\epsilon$ samples of $X_i$, which requires $R_2 = \epsilon$ bits
per sample.

Figure 4. Rate $R_1$ versus cost $\Gamma$ for the examples in Sec. IV with $\epsilon = 0.2$.

Fig. 4 illustrates the rate $R_1$ in (20a) versus the cost budget $\Gamma$ for $\epsilon = 0.2$.
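The information flow of the time-sharing scheme for $D_1 = D_2 = 0$ can be traced on a single realization with the Python sketch below. It is an illustration under stated assumptions: the function name and symbol encodings are hypothetical, which non-erased samples Node 2 measures is an arbitrary choice (the first ones), $\Gamma n$ is assumed to be at least the number of erasures, and only which node knows what is tracked, not the compression itself.

```python
ERASURE = "e"   # hypothetical encoding of the erasure symbol e


def time_sharing_case_c(x, z, gamma):
    """Sample-level sketch of the time-sharing scheme of Sec. IV-C.

    Assumes gamma * n is at least the number of erasures. Returns the actions,
    the non-erased positions described by Node 1, and the reconstructions of
    Z^n at Node 2 and of X^n at Node 1."""
    n = len(z)
    erased = [i for i in range(n) if z[i] == ERASURE]
    clear = [i for i in range(n) if z[i] != ERASURE]
    # Budget left after measuring every erased sample (these serve Node 1).
    extra = max(0, min(len(clear), int(gamma * n) - len(erased)))
    measured_clear = set(clear[:extra])   # arbitrary choice of which ones
    described = clear[extra:]             # sent by Node 1 on the forward link
    a = [1 if z[i] == ERASURE or i in measured_clear else 0 for i in range(n)]
    y = [x[i] if a[i] == 1 else None for i in range(n)]   # vending machine (17)
    # Node 2 knows E^n (forward link), the described Z_i and its measurements.
    z_hat = [ERASURE if z[i] == ERASURE else None for i in range(n)]
    for i in described:
        z_hat[i] = z[i]
    for i in measured_clear:
        z_hat[i] = y[i]
    # Node 1 receives X_i at the erased positions from Node 2 (backward link).
    x_hat = [y[i] if z[i] == ERASURE else z[i] for i in range(n)]
    return a, described, z_hat, x_hat
```

In a toy run with $n = 10$, two erasures and $\Gamma = 0.5$, Node 1 describes $n(1 - \Gamma) = 5$ samples, matching the $(1 - \Gamma)$ term of (20a), while Node 2 spends its whole budget of 5 measurements and both reconstructions are exact.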

V. When the Side Information May Be Absent 

In this section, we generalize the results of the previous sections to the scenario in Fig. 5 in
which, unbeknownst to Node 1, Node 2 may be unable to perform information acquisition due,
e.g., to energy shortage or malfunctioning.

A. System Model 

The formal description of an $(n, R_1, R_2, D_1, D_2, D_3, \Gamma, \epsilon)$ code for the set-up of Fig. 5 is
given as in Sec. III-A (which generalizes the model in Sec. II) with the addition of Node 3.
This added node, which has no access to the side information, models the situation in which the
recipient of the communication from Node 1 happens to be unable to acquire information from
the environment. Note that the same message $M_1$ from Node 1 is received by both Node 2 and
Node 3. This captures the fact that the information about whether or not the recipient is able to
access the side information is not available to Node 1. The model in Fig. 5 is a generalization
of the so-called Heegard-Berger problem [10], [11].
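Formally, Node 3 is defined by the decoding function

$$h_3\colon [1, 2^{nR_1}] \to \hat{\mathcal{X}}_3^n, \tag{21}$$

which maps the message $M_1$ into the estimated sequence $\hat{X}_3^n$, and by the additional distortion
constraint

$$\frac{1}{n}\sum_{i=1}^n \mathrm{E}[d_3(X_i, Y_i, Z_i, \hat{X}_{3i})] \le D_3. \tag{22}$$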



We remark that adding a link between Node 3 and Node 1 cannot improve the system performance,
given that Node 3 has access only to the message $M_1$ received from Node 1. Therefore,
this link is not included in the model.
Figure 5. Indirect two-way source coding when the side information vending machine may be absent at the recipient of the
message from Node 1. 



B. Rate-Distortion-Cost Region 

In this section, a single-letter characterization of the rate-distortion-cost region is derived for 
the set-up in Fig. 5.
Proposition 3. The rate-distortion-cost region $\mathcal{R}(D_1, D_2, D_3, \Gamma)$ for the two-way source coding
problem illustrated in Fig. 5 is given by the union of all rate pairs $(R_1, R_2)$ that satisfy the
conditions

$$R_1 \ge I(Z;A) + I(Z;\hat{X}_3|A) + I(Z;U|A,Y,\hat{X}_3) \tag{23a}$$

$$\text{and} \quad R_2 \ge I(Y;V|A,Z,U,\hat{X}_3), \tag{23b}$$

where the mutual information terms are evaluated with respect to the joint pmf

$$p(x, y, z, a, u, v, \hat{x}_3) = p(x, z)p(a, u, \hat{x}_3|z)p(y|a, x, z)p(v|a, u, y, \hat{x}_3), \tag{24}$$

for some pmfs $p(a, u, \hat{x}_3|z)$ and $p(v|a, u, y, \hat{x}_3)$ such that the inequalities

$$\mathrm{E}[d_1(X, Y, Z, f_1(V, Z))] \le D_1, \tag{25a}$$

$$\mathrm{E}[d_2(X, Y, Z, f_2(U, Y))] \le D_2, \tag{25b}$$

$$\mathrm{E}[d_3(X, Y, Z, \hat{X}_3)] \le D_3, \tag{25c}$$

$$\text{and} \quad \mathrm{E}[\Lambda(A)] \le \Gamma, \tag{25d}$$

are satisfied for some functions $f_1\colon \mathcal{V} \times \mathcal{Z} \to \hat{\mathcal{X}}_1$ and $f_2\colon \mathcal{U} \times \mathcal{Y} \to \hat{\mathcal{X}}_2$. Finally, $U$ and $V$ are
auxiliary random variables whose alphabet cardinality can be constrained as $|\mathcal{U}| \le |\mathcal{Z}||\mathcal{A}| + 3$
and $|\mathcal{V}| \le |\mathcal{U}||\mathcal{Y}||\mathcal{A}| + 1$ without loss of optimality.

The proof of the converse is provided in Appendix A. The achievable rate (23a) can be
interpreted as follows. Node 1 uses a successive refinement code with three layers. The first
layer is defined as in Sec. III and carries query and control information. The second and third
layers are designed as in the optimal Heegard-Berger scheme [10]. Specifically, the second layer
is destined to both Node 2 and Node 3, while the third layer targets only Node 2, which has
enhanced decoding capabilities due to the availability of side information.
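Two elementary identities, added here as a reading aid, help interpret (23a). The rates of the first two layers, both decodable by Node 3, add up to the mutual information between $Z$ and the pair $(A, \hat{X}_3)$, and the third layer vanishes when no refinement for Node 2 is sent:

```latex
\begin{align*}
I(Z;A) + I(Z;\hat{X}_3|A) &= I(Z;A,\hat{X}_3) && \text{(chain rule; layers 1 and 2)}\\
I(Z;U|A,Y,\hat{X}_3) &= 0 && \text{(layer 3, if $U$ is constant)}
\end{align*}
```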

To provide further details, as for Proposition 1, the encoder first maps the input sequence $Z^n$
into an action sequence $A^n$ so that the two sequences are jointly typical, which requires $I(Z; A)$
bits/source sample. Then, it maps $Z^n$ into the estimate $\hat{X}_3^n$ for Node 3 using a conditional
codebook with rate $I(Z; \hat{X}_3|A)$. Finally, it maps $Z^n$ into another sequence $U^n$, using the fact
that Node 2 has the action sequence $A^n$, the estimate $\hat{X}_3^n$ and the measurement $Y^n$. Using
conditional codebooks (with respect to $\hat{X}_3^n$ and $A^n$) and the Wyner-Ziv theorem, this
requires $I(Z; U|A, Y, \hat{X}_3)$ bits/source sample. As for the rate (23b), Node 2 employs Wyner-Ziv
coding for the sequence $Y^n$ by leveraging the side information $Z^n$ available at Node 1 and
conditioning on the sequences $U^n$, $A^n$ and $\hat{X}_3^n$, which are known to both Node 1 and Node 2
as a result of the forward communication. This requires a rate equal to the right-hand side of
(23b). Finally, Node 1 and Node 2 produce the estimates $\hat{X}_1^n$ and $\hat{X}_2^n$ as the symbol-by-symbol
functions $\hat{X}_{1i} = f_1(V_i, Z_i)$ and $\hat{X}_{2i} = f_2(U_i, Y_i)$ for $i \in [1, n]$, respectively.

Table I
Erasure distortion for reconstruction at Node 3.

C. Example 

In this section, we extend the binary example of Sec. IV to the set-up in Fig. 5. Specifically,
we consider the same setting as in Sec. IV with the addition of Node 3. For the latter, we
assume a ternary reconstruction alphabet $\hat{\mathcal{X}}_3 = \{0, 1, *\}$ and the distortion metric $d_3(x, z, \hat{x}_3) = d_3(1\{z = \mathrm{e}\}, \hat{x}_3)$
in Table I, where we recall that $E_i = 1\{Z_i = \mathrm{e}\}$ is the erasure process. Accordingly,
Node 3 is interested in recovering the erasure process $E^n$ under an erasure distortion metric
(see, e.g., [19]), where "*" represents the "don't care" or erasure reproduction symbol.

We first observe that for the cases of Sec. IV-A and Sec. IV-C the distortion requirements of Node 3 do
not change the rate-cost region. This is because, as discussed in Sec. IV, the requirement
that $D_2$ be equal to zero entails that the erasure process $E^n$ be communicated losslessly to Node
2 without leveraging the side information from the vending machine (which cannot provide
information about the erasure process). It follows that one can achieve $D_3 = 0$ at no additional
rate cost. We thus now focus on the case of Sec. IV-B, namely $D_1 = 0$ and $D_2 = D_{2,\max}$.

In the case at hand, Node 1 wishes to recover $X^n$ losslessly, Node 2 has no distortion
requirements and Node 3 wants to recover $E^n$ with distortion $D_3$. As explained in Sec. IV-B,
in order to reconstruct $X^n$ losslessly at Node 1 we must have $\Gamma \ge \epsilon$ and $\Pr(A = 1|Z = \mathrm{e}) = 1$.
Moreover, due to the symmetry of the problem with respect to $Z = 0$ and $Z = 1$, we can
set $\Pr(A = 1|Z = 0) = \Pr(A = 1|Z = 1) = (\Gamma - \epsilon)/(1 - \epsilon)$, so that $\Pr(A = 1) = \Gamma$. To evaluate
the rate-distortion-cost region (23), we then define $\Pr(\hat{X}_3 = *|A = 1, Z = \mathrm{e}) = p_1$,
$\Pr(\hat{X}_3 = *|A = 0, Z = 0) = p_2$ and $\Pr(\hat{X}_3 = *|A = 1, Z = 0) = p_3$, with the same values for $Z = 1$. We thus get that
the rate-distortion-cost region is given by



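$$R_1 \ge H_2(\epsilon) + 1 - \epsilon - (1 - \Gamma)(1 - p_2) - (\Gamma - \epsilon)(1 - p_3) - (1 - \Gamma)p_2 - \big(\epsilon p_1 + (\Gamma - \epsilon)p_3\big)\left[H_2\left(\frac{\epsilon p_1}{\epsilon p_1 + (\Gamma - \epsilon)p_3}\right) + \frac{(\Gamma - \epsilon)p_3}{\epsilon p_1 + (\Gamma - \epsilon)p_3}\right] \tag{26a}$$

and

$$R_2 \ge \epsilon, \tag{26b}$$

where the parameters $p_1, p_2, p_3 \in [0, 1]$ must be selected so as to satisfy the distortion constraint of Node 3, namely $D_3 \ge \epsilon p_1 + (1 - \Gamma)p_2 + (\Gamma - \epsilon)p_3$.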
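As a worked step, the bound (26a) simplifies considerably. Writing $s = \epsilon p_1 + (\Gamma - \epsilon)p_3$ (a shorthand introduced only here), the two $p_2$ terms add up to $-(1 - \Gamma)$, and the terms $-(\Gamma - \epsilon)(1 - p_3)$ and $-s \cdot (\Gamma - \epsilon)p_3 / s$ add up to $-(\Gamma - \epsilon)$, so that for $s > 0$

\begin{align*}
R_1 &\ge H_2(\epsilon) + 1 - \epsilon - (1 - \Gamma) - (\Gamma - \epsilon) - s\, H_2\left(\frac{\epsilon p_1}{s}\right) \\
&= H_2(\epsilon) - s\, H_2\left(\frac{\epsilon p_1}{s}\right).
\end{align*}

Hence $p_2$ enters only through the distortion constraint, and the choice $p_1 = p_3 = 1$ (so that $s = \Gamma$) yields $H_2(\epsilon) - \Gamma H_2(\epsilon/\Gamma)$, the right-hand side of (36) in Appendix B.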

Fig. 6 illustrates the rate $R_1$ in (26a), minimized over $p_1$, $p_2$ and $p_3$ under the constraints mentioned above, versus the cost budget $\Gamma$ for $\epsilon = 0.2$ and different values of $D_3$, namely $D_3 = 0.4, 0.6, 0.8$ and $D_3 = D_{3,\max} = 1$. Note that for $D_3 = D_{3,\max} = 1$ we obtain the rate in (19a). As can be seen, for $\Gamma < D_3$ the rate decreases with increasing cost $\Gamma$, but for $\Gamma > D_3$ the rate remains constant as $\Gamma$ increases. The reason is that in the latter regime, i.e., $\Gamma > D_3$, the performance of the system is dominated by the distortion requirement of Node 3, and thus increasing the cost budget $\Gamma$ does not improve the rate. Instead, for $\Gamma < D_3$, it is sufficient to cater only to Node 2, and Node 3 is able to recover $E^n$ with distortion $\Gamma \le D_3$ at no additional rate cost.
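The curves of Fig. 6 can be approximated by brute force. The Python sketch below is an illustration, not the authors' code: it evaluates the right-hand side of (26a) on a grid of $(p_1, p_3)$ values, keeps the points that meet Node 3's distortion constraint, and returns the smallest rate. It fixes $p_2 = 0$, which loses nothing because the $p_2$ terms of (26a) add up to $-(1 - \Gamma)$ regardless of $p_2$, so $p_2$ only tightens the constraint. The grid size `steps` and the small feasibility tolerance are arbitrary choices.

```python
import math


def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def rate_26a(eps, gamma, p1, p2, p3):
    """Right-hand side of (26a); gamma is the cost budget Gamma."""
    s = eps * p1 + (gamma - eps) * p3  # probability of A = 1 and X3_hat = "*"
    tail = 0.0
    if s > 0:
        tail = s * (h2(eps * p1 / s) + (gamma - eps) * p3 / s)
    return (h2(eps) + 1 - eps
            - (1 - gamma) * (1 - p2) - (gamma - eps) * (1 - p3)
            - (1 - gamma) * p2 - tail)


def min_rate(eps, gamma, d3, steps=200):
    """Grid-search minimum of (26a) subject to Node 3's distortion constraint."""
    if gamma < eps:
        raise ValueError("lossless recovery at Node 1 needs Gamma >= eps")
    grid = [k / steps for k in range(steps + 1)]
    p2 = 0.0  # p2 cancels in (26a), so p2 = 0 is never worse
    best = math.inf
    for p1 in grid:
        for p3 in grid:
            dist = eps * p1 + (1 - gamma) * p2 + (gamma - eps) * p3
            if dist <= d3 + 1e-12:  # small tolerance for float round-off
                best = min(best, rate_26a(eps, gamma, p1, p2, p3))
    return best


for d3 in (0.4, 0.6, 0.8, 1.0):
    print(d3, [round(min_rate(0.2, g, d3), 3) for g in (0.3, 0.5, 0.7, 0.9)])
```

For $D_3 = 1$ the constraint never binds and the minimum is attained at $p_1 = p_3 = 1$, giving $H_2(\epsilon) - \Gamma H_2(\epsilon/\Gamma)$. For $D_3 = 0.4$, the minima at $\Gamma = 0.5$, $0.7$ and $0.9$ agree up to the grid resolution, in line with the flat regime $\Gamma > D_3$ described above.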

Figure 6. Rate $R_1$ versus cost $\Gamma$ for the examples in Sec. V-C with $\epsilon = 0.2$, $D_1 = 0$ and $D_2 = D_{2,\max}$.

VI. Concluding Remarks

For applications such as complex communication networks for cloud computing or machine-to-machine communication, the bits exchanged by two parties serve a number of integrated functions, including data transmission, control and query. In this work, we have considered a baseline two-way communication scenario that captures some of these aspects. The problem is addressed from a fundamental theoretical standpoint using an information-theoretic formulation. The analysis reveals the structure of optimal communication strategies and can be applied to elaborate on specific examples, as illustrated in the paper. This work opens a number of possible avenues for future research, including the analysis of scenarios in which more than one round of interactive communication is possible [18].

Appendix A: Converse Proof for Proposition 1

Here, we prove the converse part of Proposition 1. For any $(n, R_1, R_2, D_1 + \epsilon, D_2 + \epsilon, \Gamma + \epsilon)$ code, we have the series of inequalities

\begin{align*}
nR_1 &\ge H(M_1) \\
&\overset{(a)}{=} I(M_1; X^n, Y^n) \\
&= H(X^n) + H(Y^n|X^n) - H(Y^n|M_1) - H(X^n|Y^n, M_1) \\
&\overset{(b)}{\ge} \sum_{i=1}^n \Big[ H(X_i) - H(X_i|X_{i+1}^n, Y^n, M_1, A_i) + H(Y_i|Y^{i-1}, X^n, M_1, A_i) - H(Y_i|Y^{i-1}, M_1, A_i) \Big] \\
&\overset{(c)}{\ge} \sum_{i=1}^n \Big[ H(X_i) - H(X_i|A_i, Y_i, U_i) + H(Y_i|X_i, A_i) - H(Y_i|A_i) \Big], \tag{27}
\end{align*}

where (a) follows since $M_1$ is a function of $X^n$ and since conditioning reduces entropy; (b) follows since $A_i$ is a function of $(M_1, Y^{i-1})$ and $M_1$ is a function of $X^n$; and (c) follows since conditioning decreases entropy, by defining $U_i = (M_1, X_{i+1}^n, A^{i-1}, Y^{i-1})$ and using the fact that the vending machine is memoryless. We also have the series of inequalities

\begin{align*}
nR_2 &\ge H(M_2) \\
&\ge H(M_2|X^n, M_1) \\
&\overset{(a)}{=} I(M_2; Y^n|X^n, M_1) \\
&\overset{(b)}{=} \sum_{i=1}^n \Big[ H(Y_i|Y^{i-1}, X^n, M_1, A_i) - H(Y_i|Y^{i-1}, X^n, M_1, M_2, A_i) \Big] \\
&\overset{(c)}{\ge} \sum_{i=1}^n \Big[ H(Y_i|X_i, A_i, U_i) - H(Y_i|X_i, A_i, U_i, V_i) \Big], \tag{28}
\end{align*}

where (a) follows since $M_2$ is a function of $(M_1, Y^n)$; (b) follows since $A_i$ is a function of $(M_1, Y^{i-1})$; and (c) follows since the Markov chain $Y_i - (X_i, U_i, A_i) - X^{i-1}$ holds by the problem definition (this can be easily checked by using d-separation on the Bayesian network representation of the joint distribution of the variables at hand as induced by the system model in Fig. 7; see, e.g., [20, Sec. A.9]), since conditioning reduces entropy, and by defining $V_i = M_2$.

Figure 7. Bayesian network representing the joint pmf of the variables $(M_1, M_2, X^n, Y^n, A^n)$ for the two-way source coding problem with a vending machine in Fig. 2.

Defining $Q$ to be a random variable uniformly distributed over $[1, n]$ and independent of all the other random variables, and with $X = X_Q$, $Y = Y_Q$, $A = A_Q$, $\hat{X}_1 = \hat{X}_{1Q}$, $\hat{X}_2 = \hat{X}_{2Q}$, $V = (V_Q, Q)$ and $U = (U_Q, Q)$, from (27) we have

\begin{align*}
R_1 &\ge H(X|Q) - H(X|A, Y, U, Q) + H(Y|X, A, Q) - H(Y|A, Q) \\
&\overset{(a)}{\ge} H(X) - H(X|A, Y, U) + H(Y|X, A) - H(Y|A) \\
&= I(X; A) + I(X; U|A, Y), \tag{29}
\end{align*}

where (a) follows from the fact that the source $X^n$ and the side information vending machine are memoryless and since conditioning decreases entropy. Next, from (28), we have

$$R_2 \ge H(Y|X, A, U) - H(Y|X, A, U, V) = I(Y; V|X, A, U). \tag{30}$$
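The final equality in (29) is a chain-rule identity, spelled out here as a worked step:

\begin{align*}
H(X) - H(X|A, Y, U) &= I(X; A, Y, U) = I(X; A) + I(X; Y|A) + I(X; U|A, Y), \\
H(Y|X, A) - H(Y|A) &= -I(X; Y|A).
\end{align*}

Adding the two lines, the terms $I(X; Y|A)$ cancel and $I(X; A) + I(X; U|A, Y)$ remains.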

Moreover, from Fig. 7 and using d-separation, it can be seen that the Markov chains $U_i - (X_i, A_i) - Y_i$ and $V_i - (A_i, U_i, Y_i) - X_i$ hold. This implies that the random variables $(X, Y, A, U, V)$ factorize as required by the joint distribution in Proposition 1.

We now need to show that the estimates $\hat{X}_1$ and $\hat{X}_2$ can be taken to be functions of $(V, X)$ and $(U, Y)$, respectively. To this end, recall that, by the problem definition, the reconstruction $\hat{X}_{1i}$ is a function of $(M_2, X^n)$ and thus of $(X_i, U_i, V_i, X^{i-1})$. Moreover, we can take $\hat{X}_{1i}$ to be a function of $(X_i, U_i, V_i)$ only without loss of optimality, due to the Markov chain relationship $Y_i - (X_i, U_i, V_i) - X^{i-1}$, which can again be proved by d-separation using Fig. 7. This implies that the distortion $d_1(X_i, Y_i, \hat{X}_{1i})$ cannot be reduced by also including $X^{i-1}$ in the functional dependence of $\hat{X}_{1i}$. Similarly, the reconstruction $\hat{X}_{2i}$ is a function of $(M_1, Y^n)$ by the problem definition, and can be taken to be a function of $(U_i, Y_i)$ only without loss of optimality, since the Markov chain relationship $X_i - (Y_i, A_i, U_i) - Y_{i+1}^n$ holds. These arguments and the fact that the definitions of $V$ and $U$ include the time-sharing variable $Q$ allow us to conclude that we can take $\hat{X}_1$ to be a function of $(U, V, X)$ and $\hat{X}_2$ of $(U, Y)$. We finally observe that $V$ is arbitrarily correlated with $U$ as per the joint distribution in Proposition 1, and thus we can without loss of generality set $\hat{X}_1$ to be a function of $(V, X)$ only. The bounds (10) follow immediately from the discussion above and the distortion and cost constraints in the problem definition.

To bound the cardinality of the auxiliary random variable $U$, we observe that the joint distribution in Proposition 1 factorizes as

$$p(x, y, a, u, v) = p(u)\,p(a, x|u)\,p(y|a, x)\,p(v|a, u, y). \tag{31}$$

Therefore, for fixed $p(y|a, x)$, $p(a, u|x)$ and $p(v|a, u, y)$, the characterization in Proposition 1 can be expressed in terms of integrals $\int g_j(\cdot)\,dF(u)$, for $j = 1, \ldots, |\mathcal{X}||\mathcal{A}| + 3$, of functions $g_j(\cdot)$ of the given fixed pmfs. Specifically, we have $g_j$ for $j = 1, \ldots, |\mathcal{X}||\mathcal{A}| - 1$, given by $p(a, x|u)$ for all values of $x \in \mathcal{X}$ and $a \in \mathcal{A}$ (except one); $g_{|\mathcal{X}||\mathcal{A}|} = H(X|A, Y, U = u)$; $g_{|\mathcal{X}||\mathcal{A}|+1} = I(Y; V|A, X, U = u)$; $g_{|\mathcal{X}||\mathcal{A}|+2} = E[d_1(X, Y, \hat{X}_1(V, X))|U = u]$; and $g_{|\mathcal{X}||\mathcal{A}|+3} = E[d_2(X, Y, \hat{X}_2(U, Y))|U = u]$. The proof is concluded by invoking the Carathéodory theorem.
To bound the cardinality of the auxiliary random variable $V$, we note that the joint distribution in Proposition 1 can be factorized as

$$p(x, y, a, u, v) = p(v)\,p(a, y, u|v)\,p(x|a, u, y), \tag{32}$$

so that, for fixed $p(x|a, u, y)$, the characterization in Proposition 1 can be expressed in terms of integrals $\int g_j(p(a, u, y|v))\,dF(v)$, for $j = 1, \ldots, |\mathcal{A}||\mathcal{U}||\mathcal{Y}| + 1$, of functions $g_j(\cdot)$ that are continuous on the space of probabilities over the alphabet $\mathcal{A} \times \mathcal{U} \times \mathcal{Y}$. Specifically, we have $g_j$ for $j = 1, \ldots, |\mathcal{A}||\mathcal{U}||\mathcal{Y}| - 1$, given by $p(a, u, y|v)$ for all values of $a \in \mathcal{A}$, $u \in \mathcal{U}$ and $y \in \mathcal{Y}$ (except one); $g_{|\mathcal{A}||\mathcal{U}||\mathcal{Y}|} = H(Y|A, X, U, V = v)$; and $g_{|\mathcal{A}||\mathcal{U}||\mathcal{Y}|+1} = E[d_1(X, Y, \hat{X}_1(V, X))|V = v]$. The proof is concluded by invoking the Fenchel-Eggleston-Carathéodory theorem [1, Appendix C].

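The converse for Proposition 2 follows similar steps as above, with the only difference that here we have

\begin{align*}
nR_1 &\overset{(a)}{\ge} \sum_{i=1}^n \Big[ H(Z_i) - H(Z_i|Z_{i+1}^n, Y^n, M_1, A_i) + H(Y_i|Y^{i-1}, Z^n, M_1, A_i) - H(Y_i|Y^{i-1}, M_1, A_i) \Big] \\
&\overset{(b)}{\ge} \sum_{i=1}^n \Big[ H(Z_i) - H(Z_i|Y_i, A_i, U_i) + H(Y_i|Z_i, A_i) - H(Y_i|A_i) \Big], \tag{33}
\end{align*}

where (a) follows as in (a)-(b) of (27); and (b) follows since the Markov chain relationship $Y_i - (Z_i, A_i) - (Y^{i-1}, Z^{n \setminus i}, M_1)$ holds, with $Z^{n \setminus i} = (Z^{i-1}, Z_{i+1}^n)$. The rest of the proof is as above.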



Appendix B: Proofs for the Example in Sec. IV

1) $D_1 = D_{1,\max}$ and $D_2 = 0$: Here, we prove that the rate-cost region in Proposition 2 is given by (18) for $D_1 = D_{1,\max}$ and $D_2 = 0$. We begin with the converse part. Starting from (14a), we have

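\begin{align*}
R_1 &\overset{(a)}{\ge} I(A; Z) + H(Z|A, Y) \\
&= H(Z) - I(Z; Y|A) \\
&\overset{(b)}{\ge} H(Z) - \Gamma\, I(Z; X|A = 1) \tag{34} \\
&\overset{(c)}{\ge} H(Z) - \Gamma\, H(X|A = 1) \\
&\overset{(d)}{\ge} H(Z) - \Gamma \\
&\overset{(e)}{=} H_2(\epsilon) + 1 - \epsilon - \Gamma, \tag{35}
\end{align*}

where (a) follows from (14a) and since $Z$ has to be recovered losslessly at Node 2; (b) follows since $\Pr[A = 1] = E[\Lambda(A)] \le \Gamma$; (c) follows because entropy is non-negative; (d) follows since $H(X|A = 1) \le 1$; and (e) follows because $H(Z) = H_2(\epsilon) + 1 - \epsilon$. Achievability follows by setting $U = Z$, $V = \emptyset$, $\Pr(A = 1|Z = 0) = \Pr(A = 1|Z = 1) = \Gamma/(1 - \epsilon)$ and $\Pr(A = 0|Z = \mathrm{e}) = 1$ in (14).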
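As a numerical sanity check of the achievability claim, the Python sketch below builds the joint pmf of $(X, Z, A, Y)$ and evaluates the rate. It rests on assumptions about the model of Sec. IV that are not restated in this section: $X$ is uniform, $Z$ equals $X$ or the erasure symbol e with probability $\epsilon$, and the vending machine returns $Y = X$ when $A = 1$ and an uninformative symbol (labelled `PHI` here) when $A = 0$. It also assumes that (14a) has the form $R_1 \ge I(Z;A) + I(Z;U|A,Y)$, i.e., (29) with $X$ replaced by $Z$. The helper names are hypothetical.

```python
import math
from collections import defaultdict

PHI = "phi"  # hypothetical label for the uninformative output when A = 0


def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def entropy(pmf):
    """Entropy in bits of a pmf stored as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal(joint, idx):
    """Marginal pmf of the coordinates listed in idx."""
    out = defaultdict(float)
    for key, p in joint.items():
        out[tuple(key[i] for i in idx)] += p
    return out


def cond_entropy(joint, target, given):
    """H(target | given) = H(target, given) - H(given)."""
    return entropy(marginal(joint, target + given)) - entropy(marginal(joint, given))


def joint_pmf(eps, pa1_clear, pa1_erased):
    """Joint pmf of (X, Z, A, Y) under the assumed binary erasure model."""
    joint = defaultdict(float)
    for x in (0, 1):                                  # X ~ Bernoulli(1/2)
        for z, pz in ((x, 1 - eps), ("e", eps)):      # Z = X or erased
            pa1 = pa1_erased if z == "e" else pa1_clear
            for a, pa in ((1, pa1), (0, 1 - pa1)):    # action A given Z
                y = x if a == 1 else PHI              # vending machine output
                joint[(x, z, a, y)] += 0.5 * pz * pa
    return joint


X, Z, A, Y = 0, 1, 2, 3  # positions of the variables in each joint-pmf key
eps, gamma = 0.2, 0.3
joint = joint_pmf(eps, pa1_clear=gamma / (1 - eps), pa1_erased=0.0)
# With U = Z, the bound I(Z;A) + I(Z;U|A,Y) becomes H(Z) - H(Z|A) + H(Z|A,Y).
rate = (cond_entropy(joint, [Z], []) - cond_entropy(joint, [Z], [A])
        + cond_entropy(joint, [Z], [A, Y]))
print(rate, h2(eps) + 1 - eps - gamma)  # the two values should agree
```

Only the bound on $R_1$ is evaluated here, and the choice $\Pr(A = 1|Z = 0) = \Gamma/(1 - \epsilon)$ is a valid probability only when $\Gamma \le 1 - \epsilon$.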
2) $D_1 = 0$ and $D_2 = D_{2,\max}$: Here, we turn to the case $D_1 = 0$ and $D_2 = D_{2,\max}$. We start with the converse. Since $X$ is to be reconstructed losslessly at Node 1, we have the requirement $H(X|V, Z) = 0$ from (16a). It is easy to see that this requires that the equalities $A = 1$ and $V = Y = X$ be met if $Z = \mathrm{e}$. In fact, otherwise, $X$ could not be a function of $(V, Z)$, as required by the equality $H(X|V, Z) = 0$. The condition that $A = 1$ if $Z = \mathrm{e}$ requires that the pmf $p(a|z)$ be such that $\Pr(A = 1|Z = \mathrm{e}) = 1$, which entails $\Gamma \ge \Pr[A = 1] \ge \Pr[Z = \mathrm{e}] = \epsilon$. Moreover, we can set $\Pr(A = 1|Z = 0) = \Pr(A = 1|Z = 1) = (\gamma - \epsilon)/(1 - \epsilon)$, for some $\epsilon \le \gamma \le \Gamma$, by leveraging the symmetry of the problem with respect to the selection of the actions given $Z = 0$ and $Z = 1$. Starting from (14a), we can thus write

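\begin{align*}
R_1 &\overset{(a)}{\ge} I(Z; A) \\
&= H(Z) - H(Z|A) \\
&= H_2(\epsilon) + 1 - \epsilon - \gamma H(Z|A = 1) - (1 - \gamma) H(Z|A = 0) \\
&\overset{(b)}{=} H_2(\epsilon) + 1 - \epsilon - \gamma \left[ H_2\left(\frac{\epsilon}{\gamma}\right) + 1 - \frac{\epsilon}{\gamma} \right] - (1 - \gamma) \\
&= H_2(\epsilon) - \gamma H_2\left(\frac{\epsilon}{\gamma}\right) \\
&\overset{(c)}{\ge} H_2(\epsilon) - \Gamma H_2\left(\frac{\epsilon}{\Gamma}\right), \tag{36}
\end{align*}

where (a) follows from (14a) and since there is no distortion requirement at Node 2; (b) follows by direct calculation; and (c) follows since $H_2(\epsilon) - \gamma H_2(\epsilon/\gamma)$ is minimized at $\gamma = \Gamma$ over all $\epsilon \le \gamma \le \Gamma$.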

The bound (19b) follows immediately by providing Node 2 with the sequence $X^n$ and then using the bound $R_2 \ge H(X|Z) = \epsilon$.
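Step (c) in (36) uses the fact that $\gamma H_2(\epsilon/\gamma)$ is nondecreasing in $\gamma$. A short derivation, with $q = \epsilon/\gamma \in (0, 1)$ and $H_2'(q) = \log_2\frac{1-q}{q}$:

\begin{align*}
\frac{d}{d\gamma}\left[\gamma H_2\left(\frac{\epsilon}{\gamma}\right)\right]
&= H_2(q) - q\,H_2'(q) \\
&= -q\log_2 q - (1-q)\log_2(1-q) - q\log_2\frac{1-q}{q} \\
&= -\log_2(1-q) \ge 0.
\end{align*}

Hence $H_2(\epsilon) - \gamma H_2(\epsilon/\gamma)$ is nonincreasing in $\gamma$, and its minimum over $\epsilon \le \gamma \le \Gamma$ is attained at $\gamma = \Gamma$.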

Achievability follows by setting $U = \emptyset$ and choosing the pmf $p(a|z)$ such that $\Pr(A = 1|Z = \mathrm{e}) = 1$ and $\Pr(A = 1|Z = 0) = \Pr(A = 1|Z = 1) = (\Gamma - \epsilon)/(1 - \epsilon)$. Moreover, let $V = Y = X$ if $Z = \mathrm{e}$ and $V = Y = \phi$ otherwise. Evaluating (14) with these choices leads to (19).
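The achievability claim for case 2 can be checked numerically. The Python sketch below assumes a binary erasure model for Sec. IV ($X$ uniform, $Z$ equal to $X$ or erased with probability $\epsilon$, and $Y = X$ when $A = 1$ and an uninformative symbol `PHI` when $A = 0$). It also assumes that Proposition 2 gives $R_1 \ge I(Z;A) + I(Z;U|A,Y)$ and $R_2 \ge I(Y;V|Z,A,U)$, i.e., (29)-(30) with $X$ replaced by $Z$. Finally, it reads the choice of $V$ as $V = Y$, which equals $X$ whenever $Z = \mathrm{e}$ because then $A = 1$. Helper names are hypothetical.

```python
import math
from collections import defaultdict

PHI = "phi"  # hypothetical label for the uninformative output when A = 0


def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def entropy(pmf):
    """Entropy in bits of a pmf stored as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def cond_entropy(joint, target, given):
    """H(target | given) = H(target, given) - H(given) for a tuple-keyed joint pmf."""
    def marginal(idx):
        out = defaultdict(float)
        for key, p in joint.items():
            out[tuple(key[i] for i in idx)] += p
        return out
    return entropy(marginal(target + given)) - entropy(marginal(given))


def joint_pmf(eps, pa1_clear, pa1_erased):
    """Joint pmf of (X, Z, A, Y) under the assumed binary erasure model."""
    joint = defaultdict(float)
    for x in (0, 1):                                  # X ~ Bernoulli(1/2)
        for z, pz in ((x, 1 - eps), ("e", eps)):      # Z = X or erased
            pa1 = pa1_erased if z == "e" else pa1_clear
            for a, pa in ((1, pa1), (0, 1 - pa1)):    # action A given Z
                y = x if a == 1 else PHI              # vending machine output
                joint[(x, z, a, y)] += 0.5 * pz * pa
    return joint


X, Z, A, Y = 0, 1, 2, 3  # positions of the variables in each joint-pmf key
eps, gamma = 0.2, 0.5
joint = joint_pmf(eps, pa1_clear=(gamma - eps) / (1 - eps), pa1_erased=1.0)
r1 = cond_entropy(joint, [Z], []) - cond_entropy(joint, [Z], [A])  # I(Z;A), U constant
r2 = cond_entropy(joint, [Y], [Z, A])  # I(Y;V|Z,A,U) with V = Y and U constant
print(r1, h2(eps) - gamma * h2(eps / gamma))  # compare with (36) at gamma = Gamma
print(r2, eps)
```

With $U$ constant the $R_1$ bound reduces to $I(Z;A)$, and with $V = Y$ the $R_2$ bound reduces to $H(Y|Z,A)$, which is nonzero only when $Z = \mathrm{e}$.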

3) $D_1 = D_2 = 0$: Here, we prove the rate-cost region (20) for the case $D_1 = D_2 = 0$. Starting from (14a), we have

\begin{align*}
R_1 &\overset{(a)}{\ge} H(Z) - \Gamma\, I(Z; X|A = 1) \\
&\overset{(b)}{=} H(Z) - \Gamma\, H(X|A = 1) + \Gamma\, H(X|A = 1, Z = \mathrm{e}) \Pr(Z = \mathrm{e}|A = 1) \\
&\overset{(c)}{\ge} H(Z) - \Gamma + \Gamma \cdot \frac{\epsilon}{\Gamma} \\
&= H_2(\epsilon) + 1 - \Gamma, \tag{37}
\end{align*}

where (a) follows as in (34); (b) follows because $H(X|A = 1, Z = 0) = H(X|A = 1, Z = 1) = 0$; and (c) follows since $H(X|A = 1) \le 1$, $H(X|A = 1, Z = \mathrm{e}) = 1$ and $\Pr(Z = \mathrm{e}|A = 1) = \epsilon/\Gamma$, where the latter follows from the requirement $H(X|V, Z) = 0$, as per the discussion provided in the previous case.

For the achievability, let $U = Z$, $\Pr(A = 1|Z = \mathrm{e}) = 1$ and $\Pr(A = 1|Z = 0) = \Pr(A = 1|Z = 1) = (\Gamma - \epsilon)/(1 - \epsilon)$. Moreover, let $V = Y = X$ if $Z = \mathrm{e}$ and $V = Y = \phi$ otherwise. Evaluating (14) with these choices leads to (20).
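A numerical check of the case-3 scheme, where $U = Z$ and the actions are chosen with $\Pr(A = 1|Z = \mathrm{e}) = 1$ and $\Pr(A = 1|Z = 0) = \Pr(A = 1|Z = 1) = (\Gamma - \epsilon)/(1 - \epsilon)$. The Python sketch assumes a binary erasure model ($X$ uniform, $Z$ equal to $X$ or erased with probability $\epsilon$, and $Y = X$ when $A = 1$ and an uninformative symbol `PHI` when $A = 0$). It also assumes that Proposition 2 bounds $R_1 \ge I(Z;A) + I(Z;U|A,Y)$, the $Z$-analogue of (29), and compares the result with (37). Helper names are hypothetical.

```python
import math
from collections import defaultdict

PHI = "phi"  # hypothetical label for the uninformative output when A = 0


def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def entropy(pmf):
    """Entropy in bits of a pmf stored as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def cond_entropy(joint, target, given):
    """H(target | given) = H(target, given) - H(given) for a tuple-keyed joint pmf."""
    def marginal(idx):
        out = defaultdict(float)
        for key, p in joint.items():
            out[tuple(key[i] for i in idx)] += p
        return out
    return entropy(marginal(target + given)) - entropy(marginal(given))


def joint_pmf(eps, pa1_clear, pa1_erased):
    """Joint pmf of (X, Z, A, Y) under the assumed binary erasure model."""
    joint = defaultdict(float)
    for x in (0, 1):                                  # X ~ Bernoulli(1/2)
        for z, pz in ((x, 1 - eps), ("e", eps)):      # Z = X or erased
            pa1 = pa1_erased if z == "e" else pa1_clear
            for a, pa in ((1, pa1), (0, 1 - pa1)):    # action A given Z
                y = x if a == 1 else PHI              # vending machine output
                joint[(x, z, a, y)] += 0.5 * pz * pa
    return joint


X, Z, A, Y = 0, 1, 2, 3  # positions of the variables in each joint-pmf key
eps, gamma = 0.2, 0.5
joint = joint_pmf(eps, pa1_clear=(gamma - eps) / (1 - eps), pa1_erased=1.0)
# With U = Z: I(Z;A) + I(Z;U|A,Y) = H(Z) - H(Z|A) + H(Z|A,Y).
r1 = (cond_entropy(joint, [Z], []) - cond_entropy(joint, [Z], [A])
      + cond_entropy(joint, [Z], [A, Y]))
print(r1, h2(eps) + 1 - gamma)  # compare with the right-hand side of (37)
```

Compared with a constant $U$, the extra term $H(Z|A,Y)$ is what Node 2 additionally needs to recover $Z$ losslessly.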

Appendix C: Converse Proof for Proposition 3

Here, we prove the converse part of Proposition 3. For any $(n, R_1, R_2, D_1 + \epsilon, D_2 + \epsilon, D_3 + \epsilon, \Gamma + \epsilon)$ code, we have the series of inequalities

\begin{align*}
nR_1 &\ge H(M_1) \\
&\overset{(a)}{\ge} \sum_{i=1}^n \Big[ H(Z_i) - H(Z_i|Z_{i+1}^n, Y^n, M_1, A_i, \hat{X}_{3i}) + H(Y_i|Y^{i-1}, Z^n, M_1, A_i, \hat{X}_{3i}) - H(Y_i|Y^{i-1}, M_1, A_i, \hat{X}_{3i}) \Big] \\
&\overset{(b)}{\ge} \sum_{i=1}^n \Big[ H(Z_i) - H(Z_i|Y_i, A_i, U_i, \hat{X}_{3i}) + H(Y_i|Z_i, A_i, \hat{X}_{3i}) - H(Y_i|A_i, \hat{X}_{3i}) \Big], \tag{38}
\end{align*}

where (a) follows from (a) in (33) by noting that $\hat{X}_{3i}$ is a function of $M_1$; and (b) follows since conditioning decreases entropy, by defining $U_i = (M_1, Z_{i+1}^n, A^{i-1}, Y^{i-1})$ and using the Markov chain relationship $Y_i - (Z_i, A_i, \hat{X}_{3i}) - (Y^{i-1}, Z^{n \setminus i}, M_1)$. We also have the series of inequalities

$$nR_2 \ge H(M_2) \overset{(a)}{\ge} \sum_{i=1}^n \Big[ H(Y_i|Z_i, A_i, U_i, \hat{X}_{3i}) - H(Y_i|Z_i, A_i, U_i, V_i, \hat{X}_{3i}) \Big], \tag{39}$$

where (a) follows from (28) by replacing the sequence $X^n$ with the sequence $Z^n$ and by observing that $\hat{X}_{3i}$ is a function of $M_1$. Defining $Q$ as in Appendix A, along with $\hat{X}_3 = \hat{X}_{3Q}$, from (38) we have

\begin{align*}
R_1 &\ge H(Z|Q) - H(Z|A, Y, U, \hat{X}_3, Q) + H(Y|Z, A, \hat{X}_3, Q) - H(Y|A, \hat{X}_3, Q) \\
&\overset{(a)}{\ge} H(Z) - H(Z|A, Y, U, \hat{X}_3) + H(Y|Z, A, \hat{X}_3) - H(Y|A, \hat{X}_3) \\
&= I(Z; A) + I(Z; \hat{X}_3|A) + I(Z; U|A, Y, \hat{X}_3),
\end{align*}

where (a) follows from the fact that the source $Z^n$ and the side information vending machine are memoryless and since conditioning decreases entropy. Next, from (39), we have

$$R_2 \ge H(Y|Z, A, U, \hat{X}_3) - H(Y|Z, A, U, V, \hat{X}_3) = I(Y; V|Z, A, U, \hat{X}_3). \tag{40}$$
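The last equality in the bound on $R_1$ before (40) follows from the chain rule for mutual information, spelled out here as a worked step:

\begin{align*}
H(Z) - H(Z|A, Y, U, \hat{X}_3) &= I(Z; A) + I(Z; \hat{X}_3|A) + I(Z; Y|A, \hat{X}_3) + I(Z; U|A, Y, \hat{X}_3), \\
H(Y|Z, A, \hat{X}_3) - H(Y|A, \hat{X}_3) &= -I(Z; Y|A, \hat{X}_3).
\end{align*}

Adding the two lines, the terms $I(Z; Y|A, \hat{X}_3)$ cancel and $I(Z; A) + I(Z; \hat{X}_3|A) + I(Z; U|A, Y, \hat{X}_3)$ remains.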

Moreover, by just adding $\hat{X}_3$ to the Bayesian network in Fig. 7 and using d-separation, it can be seen that the Markov chains $U_i - (Z_i, A_i) - Y_i$ and $V_i - (A_i, U_i, Y_i, \hat{X}_{3i}) - Z_i$ hold, which implies that the random variables $(X, Y, Z, A, U, V, \hat{X}_3)$ factorize as in (24). Based on the discussion in the converse proof in Appendix A, it is easy to see that the estimates $\hat{X}_1$ and $\hat{X}_2$ are functions of $(V, Z)$ and $(U, Y)$, respectively. The bounds (23) follow immediately from the discussion above, the distortion and cost constraints in the problem definition, and (22).